Optimized stopping criteria for tree-based unit selection in concatenative synthesis
Abstract
A lack of naturalness hampers the widespread application of speech synthesis. Increasing the size of the unit database in a concatenative speech synthesizer has been proposed as a way to increase the variety of units and thereby improve naturalness. However, expanding the unit database raises the computational cost of selecting the most appropriate unit and compounds the risk that a perceptually suboptimal unit is chosen. Clustering the unit database prior to synthesis is an effective method for reducing this cost and risk. In this study, a unit selection method based on tree-structured clustering of data is implemented and evaluated. This approach to tree construction differs from similar approaches used in both synthesis and recognition in that a “right-sized” tree is found automatically rather than by hand-tuned stopping criteria. The tree is grown to its maximum size, and its leaves are then systematically recombined to determine the most suitable subtree. Trees are grown using the automatic stopping method and compared with those grown using thresholds. Cross validation shows that trees grown to their maximum size and systematically recombined produce fuller clusters with lower objective distortion measures than trees whose growth is arrested by a threshold. The study concludes with a discussion of how these results may affect the perceptual quality of a speech synthesizer.
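The grow-then-recombine idea can be illustrated with a small sketch. It is not the study's procedure: it assumes a single numeric context feature and a scalar acoustic value per unit, uses squared error as the distortion measure, and swaps in simple reduced-error pruning against one held-out set in place of the cross-validated recombination the abstract describes. All names (`Node`, `grow`, `prune`, the toy data) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (context feature, acoustic value); both are assumptions


@dataclass
class Node:
    mean: float                        # centroid of the training units in this cluster
    threshold: Optional[float] = None  # split point; None for a leaf
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def is_leaf(self) -> bool:
        return self.left is None


def sse(values: List[float], center: float) -> float:
    """Squared-error distortion of values around a cluster centroid."""
    return sum((v - center) ** 2 for v in values)


def grow(points: List[Point]) -> Node:
    """Grow to maximum size: keep splitting while distinct feature values remain."""
    ys = [y for _, y in points]
    node = Node(mean=sum(ys) / len(ys))
    xs = sorted({x for x, _ in points})
    if len(xs) < 2:
        return node
    best_cost, best_t = None, None
    for t in xs[:-1]:  # splitting at "x <= t" always leaves both sides non-empty
        l = [y for x, y in points if x <= t]
        r = [y for x, y in points if x > t]
        cost = sse(l, sum(l) / len(l)) + sse(r, sum(r) / len(r))
        if best_cost is None or cost < best_cost:
            best_cost, best_t = cost, t
    node.threshold = best_t
    node.left = grow([p for p in points if p[0] <= best_t])
    node.right = grow([p for p in points if p[0] > best_t])
    return node


def predict(node: Node, x: float) -> float:
    while not node.is_leaf():
        node = node.left if x <= node.threshold else node.right
    return node.mean


def prune(node: Node, held_out: List[Point]) -> float:
    """Recombine leaves bottom-up; return held-out distortion of the kept subtree."""
    as_leaf = sse([y for _, y in held_out], node.mean)
    if node.is_leaf():
        return as_leaf
    l = [p for p in held_out if p[0] <= node.threshold]
    r = [p for p in held_out if p[0] > node.threshold]
    as_subtree = prune(node.left, l) + prune(node.right, r)
    if as_leaf <= as_subtree:  # merging the children does not hurt held-out distortion
        node.left = node.right = node.threshold = None
        return as_leaf
    return as_subtree


def count_leaves(node: Node) -> int:
    return 1 if node.is_leaf() else count_leaves(node.left) + count_leaves(node.right)


# Toy data: two true clusters (values near 0 and near 10) with small deterministic noise.
train = [(float(i), (0.0 if i < 5 else 10.0) + 0.1 * ((i * 7) % 5 - 2)) for i in range(10)]
held_out = [(float(i), (0.0 if i < 5 else 10.0) + 0.1 * ((i * 3) % 5 - 2)) for i in range(10)]

tree = grow(train)
full_leaves = count_leaves(tree)
before = sum((y - predict(tree, x)) ** 2 for x, y in held_out)
after = prune(tree, held_out)
pruned_leaves = count_leaves(tree)
print(full_leaves, pruned_leaves, before, after)
```

In this sketch no stopping threshold is tuned by hand: the fully grown tree overfits the training noise, and recombination removes the splits that do not lower distortion on the held-out units.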
Similar resources
Segment pre-selection in decision-tree based speech synthesis systems
Corpus-based approaches to unit selection for concatenative speech synthesis have become popular in recent years due to their improved sensitivity to unit context over their simpler predecessors. These systems usually make use of large speech databases and employ sophisticated search algorithms to determine the optimal unit sequence to use to synthesise each sentence. For many applications ...
Distance Mapping for Corpus-Based Concatenative Synthesis
In the most common approach to corpus-based concatenative synthesis, unit selection takes place as a content-based similarity match based on a weighted Euclidean distance between the audio descriptors of the database units and the synthesis target. While the simplicity of this method explains the relative success of CBCS for interactive descriptor-based granular synthesis, especially when c...
Concatenative Synthesis of Expressive Saxophone Performance
In this paper we present a systematic approach to applying expressive performance models to non-expressive score transcriptions and synthesizing the results by means of concatenative synthesis. Expressive performance models are built from score transcriptions and recorded performances by means of decision tree rule induction, and those models are used both to transform inexpressive input scores...
Comparing different stopping criteria for fuzzy decision tree induction through IDFID3
Fuzzy Decision Tree (FDT) classifiers combine decision trees with the approximate reasoning offered by fuzzy representation to deal with language and measurement uncertainties. When an FDT induction algorithm uses stopping criteria to halt the tree's growth early, the threshold values of those criteria control the number of nodes. Finding a proper threshold value for a stopping crite...
Generating Script Using Statis the Context Variation
A statistical selection method is proposed for generating an optimized recording script for a concatenative speech synthesizer. This method starts by traversing a large text corpus to collect statistical information on the Context Variation Unit Vectors (CVUV), which represent the multidimensional phonetic contexts and properties of the synthesis unit. Each CVUV descriptor is organized as a n...
Publication date: 1998